perf: chunk long string byte escaping by He-Pin · Pull Request #809 · databricks/sjsonnet

He-Pin · 2026-04-30T05:09:20Z

Motivation:

Split the JMH-positive, JDK17/JIT/GC-friendly long-string rendering piece out of #776. The original PR mixed renderer, stdlib, format, compareStrings, and Scala Native changes, and its Native hyperfine results were not clean enough to merge it as one large PR.

Key Design Decision:

Keep this PR focused on byte-rendering long strings that contain JSON escapes. This PR does not include compareStrings, char materializer, stdlib asciiSafe/substr/join, or format char-array assembly changes from #776.

Modification:

Add CharSWAR.findFirstEscapeChar(byte[], from, to) on JVM, Scala.js, and Scala Native.
In BaseByteRenderer, keep the existing UTF-8 byte array for long strings, locate escape bytes, bulk-copy clean chunks with System.arraycopy, and escape only the matching bytes inline.
Precompute the exact escaped output length before writing dirty long strings so ByteBuilder does not grow repeatedly.
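
The three steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR: the class `ChunkedEscapeSketch` and all of its method names are hypothetical, the output goes into a plain `byte[]` rather than sjsonnet's ByteBuilder, and the escape set (short escapes for `"`, `\`, `\b`, `\f`, `\n`, `\r`, `\t`, lowercase `\u00XX` for the other control bytes) is assumed rather than taken from RenderUtils.

```java
import java.nio.charset.StandardCharsets;

final class ChunkedEscapeSketch {
    // Static hex digits for the six-byte unicode escape of control bytes.
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /** Index of the first byte in [from, to) that needs a JSON escape, or -1 if there is none. */
    static int findFirstEscapeByte(byte[] a, int from, int to) {
        for (int i = from; i < to; i++) {
            int b = a[i] & 0xFF; // Java bytes are signed; compare as unsigned
            if (b < 0x20 || b == '"' || b == '\\') return i;
        }
        return -1;
    }

    /** How many output bytes a single input byte occupies once escaped. */
    static int escapedLength(int b) {
        switch (b) {
            case '"': case '\\': case '\b': case '\f': case '\n': case '\r': case '\t':
                return 2;                // two-byte short escape
            default:
                return b < 0x20 ? 6 : 1; // six-byte unicode escape, or the byte itself
        }
    }

    /** Writes the escape for byte b starting at out[o] and returns the next write index. */
    static int writeEscape(byte[] out, int o, int b) {
        out[o++] = '\\';
        switch (b) {
            case '"':  out[o++] = '"';  break;
            case '\\': out[o++] = '\\'; break;
            case '\b': out[o++] = 'b';  break;
            case '\f': out[o++] = 'f';  break;
            case '\n': out[o++] = 'n';  break;
            case '\r': out[o++] = 'r';  break;
            case '\t': out[o++] = 't';  break;
            default:   // remaining control bytes below 0x20
                out[o++] = 'u'; out[o++] = '0'; out[o++] = '0';
                out[o++] = HEX[b >> 4];
                out[o++] = HEX[b & 0xF];
        }
        return o;
    }

    /** Renders s as a quoted JSON string in UTF-8, bulk-copying the clean runs. */
    static byte[] render(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        // Pass 1: compute the exact output size so the buffer never has to grow.
        int outLen = 2; // opening and closing quote
        for (byte x : src) outLen += escapedLength(x & 0xFF);
        byte[] out = new byte[outLen];
        int o = 0;
        out[o++] = '"';
        // Pass 2: copy each clean chunk in one call, escape only the byte that matched.
        int pos = 0;
        while (pos < src.length) {
            int hit = findFirstEscapeByte(src, pos, src.length);
            int end = hit < 0 ? src.length : hit;
            System.arraycopy(src, pos, out, o, end - pos);
            o += end - pos;
            if (hit < 0) break;
            o = writeEscape(out, o, src[hit] & 0xFF);
            pos = hit + 1;
        }
        out[o] = '"';
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(render("tab\there"), StandardCharsets.UTF_8));
    }
}
```

For a clean string the loop does one scan and one `System.arraycopy`, which is the fast path the PR says must stay neutral. The sizing pass costs an extra read of the input but replaces repeated buffer growth with one allocation of known size.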

JDK17 / JIT / GC Notes:

Straight byte-array loops and System.arraycopy; no reflection or internal JDK APIs.
Reuses the existing UTF-8 byte-array allocation from master; no extra temporary arrays beyond the existing long-string encoding.
Clean long strings stay on the same bulk-copy fast path.
Dirty long strings avoid falling back to whole-string char escaping.
The JDK17 API that looks tempting here, HexFormat, is intentionally not used because per-control-char formatting would be a worse JIT/GC shape than the static hex table.
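
To make the static hex table concrete, here is a small Java sketch of writing a `\u00XX` escape straight into a byte buffer. The class `ControlEscapeSketch`, the method `writeUnicodeEscape`, and the choice of lowercase digits are assumptions for illustration; the PR does not show the actual table.

```java
import java.nio.charset.StandardCharsets;

final class ControlEscapeSketch {
    // One shared, immutable lookup table; nothing is allocated per escaped byte.
    private static final byte[] HEX = {
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    /** Writes the six-byte unicode escape for b at out[pos..pos+6) and returns pos + 6. */
    static int writeUnicodeEscape(byte[] out, int pos, int b) {
        out[pos]     = '\\';
        out[pos + 1] = 'u';
        out[pos + 2] = '0';
        out[pos + 3] = '0';
        out[pos + 4] = HEX[(b >> 4) & 0xF]; // high nibble
        out[pos + 5] = HEX[b & 0xF];        // low nibble
        return pos + 6;
    }

    public static void main(String[] args) {
        byte[] out = new byte[6];
        writeUnicodeEscape(out, 0, 0x1F);
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

The method is two array loads and six stores with no formatter object or intermediate String, which is the kind of shape a JIT can inline into the surrounding copy loop.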

Focused Target JMH + GC:

JMH ran with the project compiled at the JDK17 level using the current Mill toolchain. Command shape:

./mill --no-server bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path=... -wi 3 -i 5 -r 3s -w 2s -f 1 -prof gc

Benchmark	master ms/op	PR ms/op	Delta	master alloc B/op	PR alloc B/op	GC note
`large_string_template`	1.686 +/- 0.027	1.398 +/- 0.464	-17.1%	7,775,106	7,774,803	allocation neutral/slightly lower
`large_string_join`	0.637 +/- 0.075	0.646 +/- 0.025	neutral	1,530,343	1,530,269	clean path neutral

Full JMH + GC Sweep:

All 36 regression benchmark inputs were covered. The full sweep command used:

./mill --no-server bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path="$PATHS" -wi 3 -i 5 -r 2s -w 1s -f 1 -prof gc -rf json

bench.07 needs the same larger stack that bench.runRegressions normally provides, so it was rerun separately with -jvmArgsAppend -Xss100m. The full sweep is a screening run, not a claim that this renderer-only PR improves unrelated stdlib/parser cases; several unrelated rows had obvious system/JIT outliers. Judged by whether the master and PR JMH error intervals overlap, no row showed a clear time regression.
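
One way to read the error-interval criterion is as a plain overlap test on score +/- error. The Java sketch below is an assumption about how such a check could be written, not the procedure used for this PR; `JmhIntervalSketch` and `intervalsOverlap` are hypothetical names. Because the full-sweep table omits the error column, the example uses the `large_string_join` row from the focused run.

```java
final class JmhIntervalSketch {
    /** True when [aScore - aErr, aScore + aErr] and [bScore - bErr, bScore + bErr] share a point. */
    static boolean intervalsOverlap(double aScore, double aErr, double bScore, double bErr) {
        return aScore - aErr <= bScore + bErr && bScore - bErr <= aScore + aErr;
    }

    public static void main(String[] args) {
        // large_string_join, focused run: master 0.637 +/- 0.075, PR 0.646 +/- 0.025
        System.out.println(intervalsOverlap(0.637, 0.075, 0.646, 0.025)); // prints "true"
    }
}
```

Overlapping intervals mean the run cannot separate the two builds, which is why that row is reported as neutral rather than as a regression.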

Benchmark	master ms/op	PR ms/op	Delta	Alloc delta
`assertions`	0.205	0.209	+1.8%	+1.11%
`bench.01`	0.052	0.048	-7.4%	+2.47%
`bench.02`	28.057	26.910	-4.1%	-0.00%
`bench.03`	7.048	7.244	+2.8%	+0.00%
`bench.04`	0.116	0.118	+1.9%	-0.01%
`bench.06`	0.578	0.217	-62.5%	+0.32%
`bench.07`	2.754	2.465	-10.5%	-0.00%
`bench.08`	0.956	0.038	-96.0%	-4.24%
`bench.09`	0.332	0.044	-86.8%	+2.52%
`gen_big_object`	2.697	0.803	-70.2%	-0.06%
`large_string_join`	0.588	0.584	-0.8%	-0.04%
`large_string_template`	1.631	1.260	-22.8%	-0.01%
`realistic1`	1.447	1.610	+11.3%	+0.00%
`realistic2`	43.015	42.317	-1.6%	+0.00%
`base64`	0.156	0.151	-3.4%	+0.00%
`base64Decode`	0.125	0.118	-5.4%	+0.00%
`base64DecodeBytes`	5.348	5.228	-2.2%	-0.02%
`base64_byte_array`	0.851	0.775	-8.9%	-0.00%
`base64_stress`	0.192	0.177	-8.0%	-0.01%
`comparison`	0.070	0.033	-53.3%	-0.12%
`comparison2`	143.021	44.546	-68.9%	-0.02%
`escapeStringJson`	0.798	0.057	-92.8%	-1.96%
`foldl`	0.091	0.101	+10.6%	+0.28%
`lstripChars`	0.122	0.114	-6.4%	+0.01%
`manifestJsonEx`	1.035	0.052	-95.0%	-4.31%
`manifestTomlEx`	1.128	0.068	-94.0%	-1.92%
`manifestYamlDoc`	0.057	0.056	-1.1%	-0.92%
`member`	0.660	0.639	-3.1%	-0.00%
`parseInt`	0.084	0.032	-61.7%	-0.09%
`reverse`	34.736	6.770	-80.5%	-0.03%
`rstripChars`	0.122	0.122	-0.3%	-0.01%
`stripChars`	0.129	0.115	-10.9%	+0.01%
`substr`	0.060	0.056	-6.4%	+0.01%
`setDiff`	0.418	0.392	-6.1%	-0.07%
`setInter`	0.358	0.350	-2.4%	-0.13%
`setUnion`	0.625	0.622	-0.5%	+0.04%

Focused Rechecks:

Rows that looked suspicious in the full sweep were rerun with longer settings. The only stable allocation concern from the raw table was bench.09; a 3-fork rerun made it neutral/slightly lower. bench.06 was also rerun because the raw full sweep showed a small allocation delta.

Benchmark	master ms/op	PR ms/op	Delta	Alloc delta
`bench.06`	0.219	0.215	-1.7%	+0.05%
`bench.09`	0.042	0.042	-1.2%	-0.36%

Scala Native Hyperfine:

Native artifacts were built with ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink' on both master and this branch. Each hyperfine command loops 20 CLI invocations, uses --warmup 5 --runs 60, and the table divides the reported mean/median back to per-invocation milliseconds.

Benchmark	master mean ms	PR mean ms	Delta	master median ms	PR median ms	Median delta
`large_string_template`	11.60 +/- 0.98	10.30 +/- 0.82	-11.3%	11.32	9.95	-12.1%
`large_string_join`	6.01 +/- 0.12	6.02 +/- 0.16	neutral	5.98	5.98	neutral

Correctness Review:

visitLongString is only used for String values when escapeUnicode = false, matching the existing ByteRenderer path.
Every byte of a multi-byte UTF-8 sequence, lead or continuation, is >= 0x80, so scanning the UTF-8 byte array for ", \, or bytes < 0x20 (compared as unsigned values) cannot falsely match inside a non-ASCII code point.
Control characters U+0000 through U+001F remain single-byte UTF-8 and are emitted as the same JSON escapes as the old RenderUtils.escapeByte fallback.
Buffer safety was rechecked: every chunk copy reads elemBuilder.arr after ensureLength, so a grow cannot leave a stale array reference.
JVM and Native CLI parity checks against master passed for long clean ASCII, long non-ASCII, quote/backslash, and control-character mixed strings.
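
The UTF-8 argument above can be checked with a few lines of Java. This is a hedged sketch rather than the PR's scanner: `Utf8ScanSketch` and its methods are hypothetical, and the deliberately wrong `naiveNeedsEscape` is included only to show why a JVM byte must be masked before the `< 0x20` comparison.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ScanSketch {
    /** Correct test: treat the byte as unsigned before comparing. */
    static boolean needsEscape(byte raw) {
        int b = raw & 0xFF;
        return b < 0x20 || b == '"' || b == '\\';
    }

    /** Deliberately wrong: a signed byte >= 0x80 is negative, so it passes the < 0x20 test. */
    static boolean naiveNeedsEscape(byte raw) {
        return raw < 0x20 || raw == '"' || raw == '\\';
    }

    /** Counts the bytes of the UTF-8 encoding of s that the correct test flags. */
    static int countEscapes(String s) {
        int n = 0;
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            if (needsEscape(raw)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Two-, three-, and four-byte code points: no byte is flagged.
        System.out.println(countEscapes("\u00e9\u4e2d\ud83d\ude00")); // prints 0
        // The same accented letter followed by a quote and a newline: two bytes flagged.
        System.out.println(countEscapes("\u00e9\"\n"));               // prints 2
        // 0xC3 is the lead byte of the two-byte sequence; the signed test misfires on it.
        System.out.println(naiveNeedsEscape((byte) 0xC3));            // prints true
    }
}
```

Since every flagged byte is a complete ASCII character, escaping it in place never splits a multi-byte sequence, so the clean chunks on either side remain valid UTF-8.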

Rejected Splits From #776:

Format.scala char-array assembly: not JMH-positive on current master.
length/substr/asciiSafe/join group: substr regressed, so it should not be split out as-is.
std.join exact-capacity builder: allocation improved in one run, but no-prof JMH regressed.
compareStrings/SWAR group: too broad and not GC-proven for a focused first split.
JVM String.indexOf escape scan: tiny signal only, not enough for a separate PR.

Verification:

./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
./mill --no-server 'sjsonnet.jvm[3.3.7].test'
./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
Full JMH+GC sweep over all 36 regression benchmark inputs
Focused JMH+GC rechecks for suspicious full-sweep rows
Native hyperfine commands above
JVM/Native output parity checks against master for long string escape edge cases

References:

Split from perf: comprehensive Scala Native render pipeline optimization #776
Base: 3a9a492899420456070fb84eaa5b89f8b7dfe1bf
Head: ed9af56139ad6a066910483104bae165cef53d16

Motivation:
Split the JMH-positive long-string rendering piece out of databricks#776 without carrying over the broader Scala Native render-pipeline experiment.

Modification:
- Add CharSWAR.findFirstEscapeChar for byte arrays on JVM, JS, and Native.
- Keep the existing UTF-8 byte array for long strings, but locate escape bytes and copy clean chunks with System.arraycopy.
- Escape only the matching bytes inline.
- Precompute the exact escaped output length before writing dirty strings so ByteBuilder does not grow repeatedly.

Result:
This keeps the change JDK17/JIT/GC friendly: straight byte-array loops, no internal JDK APIs, no extra temporary arrays beyond the existing UTF-8 encoding, and no regression on clean long strings.

He-Pin mentioned this pull request Apr 30, 2026

perf: comprehensive Scala Native render pipeline optimization #776

Draft

He-Pin wants to merge 1 commit into databricks:master from He-Pin:split/pr776-byte-chunked-escape
